14 research outputs found

    The VIA Annotation Software for Images, Audio and Video

    In this paper, we introduce a simple and standalone manual annotation tool for images, audio and video: the VGG Image Annotator (VIA). This is a lightweight, standalone and offline software package that does not require any installation or setup and runs solely in a web browser. The VIA software allows human annotators to define and describe spatial regions in images or video frames, and temporal segments in audio or video. These manual annotations can be exported to plain text data formats such as JSON and CSV and are therefore amenable to further processing by other software tools. VIA also supports collaborative annotation of a large dataset by a group of human annotators. The BSD open source license of this software allows it to be used in any academic project or commercial application. Comment: to appear in Proceedings of the 27th ACM International Conference on Multimedia (MM '19), October 21-25, 2019, Nice, France. ACM, New York, NY, USA, 4 pages.
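
As an illustration of the "further processing by other software tools" mentioned in the abstract above, here is a minimal sketch that converts a JSON list of region annotations into CSV using only the Python standard library. The flat record layout (`filename`, `shape`, `x`, `y`, `width`, `height`, `label`) and the function name `annotations_to_csv` are assumptions made for this example; they are not the actual VIA export schema, which should be checked against the VIA documentation.

```python
import csv
import io
import json

# Hypothetical export: this flat schema is an assumption for illustration,
# NOT the actual VIA export format.
EXPORTED_JSON = json.dumps([
    {"filename": "frame_001.jpg", "shape": "rect",
     "x": 10, "y": 20, "width": 50, "height": 40, "label": "cat"},
    {"filename": "frame_002.jpg", "shape": "rect",
     "x": 5, "y": 8, "width": 30, "height": 30, "label": "dog"},
])

# Column order of the CSV output (hypothetical field names).
FIELDS = ["filename", "shape", "x", "y", "width", "height", "label"]


def annotations_to_csv(json_text: str) -> str:
    """Parse a JSON list of region records and return them as CSV text."""
    records = json.loads(json_text)
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDS, lineterminator="\n")
    writer.writeheader()
    writer.writerows(records)
    return buffer.getvalue()


csv_text = annotations_to_csv(EXPORTED_JSON)
print(csv_text)
```

Because both formats are plain text, the same records can be passed on to spreadsheet tools or training pipelines without any dependency on the annotation tool itself.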

    Interactive computer vision through the Web

    Computer vision is the computational science that aims to reproduce and improve on the ability of human vision to understand its environment. In this thesis, we focus on two fields of computer vision, namely image segmentation and visual odometry, and we show the positive impact that interactive Web applications have on each. The first part of this thesis focuses on image annotation and segmentation. We introduce the image annotation problem and the challenges it brings for large, crowdsourced datasets. Many interactions have been explored in the literature to help segmentation algorithms. The most common consist of designating contours, drawing bounding boxes around objects, or drawing interior and exterior scribbles. When crowdsourcing, annotation tasks are delegated to a non-expert public, sometimes on cheaper devices such as tablets. In this context, we conducted a user study showing the advantages of the outlining interaction over scribbles and bounding boxes. Another challenge of crowdsourcing is the distribution medium. While evaluating an interaction in a small user study does not require a complex setup, distributing an annotation campaign to thousands of potential users may. Thus we describe how the Elm programming language helped us build a reliable image annotation Web application. A tour of its main features and architecture is provided, as well as a guide on how to deploy it to crowdsourcing services such as Amazon Mechanical Turk. The application is completely open source and available online. In the second part of this thesis, we present our open-source direct visual odometry library. In that endeavor, we provide an evaluation of other open-source RGB-D camera tracking algorithms and show that our approach performs as well as the currently available alternatives. The visual odometry problem relies on geometry tools and optimization techniques that traditionally require substantial processing power to run at real-time frame rates. Since we aspire to run those algorithms directly in the browser, we review past and present technologies enabling high-performance computation on the Web. In particular, we detail how to target a new standard called WebAssembly from the C++ and Rust programming languages. Our library was written from scratch in the Rust programming language, which then allowed us to easily port it to WebAssembly. Thanks to this property, we are able to showcase a visual odometry Web application with multiple types of interactions available. A timeline enables one-dimensional navigation along the video sequence. Pairs of image points can be picked on two 2D thumbnails of the image sequence to realign cameras and correct drift. Colors are also used to identify parts of the 3D point cloud, which can be selected to reinitialize camera positions. Combining those interactions improves the tracking and 3D point reconstruction results.

    Web-Based Configurable Image Annotations

    We introduce a new application for annotating images, with the purpose of constituting training datasets for machine learning algorithms. Our open-source software is meant to be easily used and deployed, configured to meet the annotation needs of any use case, and embeddable in crowdsourcing campaigns using the Amazon Mechanical Turk service.

    Protocols and Software for Simplified Educational Video Capture and Editing

    Recently, educational videos have become important parts of e-learning systems, which have in turn become widely used due to their flexibility. These videos should be of high quality, since higher production values lead to superior learning outcomes. However, creating high-quality video is a difficult task for teachers, since it requires technical knowledge that includes video recording and timeline usage. Hence, educational video production software that is both easy to use and able to produce high-quality educational videos would be very advantageous. In this paper, we developed protocols for an easy-to-use piece of software that enables teachers who have little technological background to produce their own educational videos autonomously. Our contribution is to reduce the complexity of the whole video production process by introducing a preparation step based on micro-teaching and upstream specification. The software was evaluated with six teachers. This evaluation, based on a think-aloud protocol and quantitative measurements, showed that the introduction of the preparation step allowed the participating teachers to produce high-quality educational videos in less than three hours.

    Трансформации библеизмов в современной публицистике [Transformations of Biblicisms in Contemporary Journalism]

    This thesis describes the use of biblicisms, that is, set expressions that trace back to the Bible, in the contemporary Russian press. The aim is to build a comprehensive classification of the transformations of biblicisms, where a transformation means any modification made to the form or meaning of an expression and the variant that results from it. The thesis also examines, among other things, whether the semantic non-motivation of an expression prevents its transformation. The hypothesis is that only semantic transformations are possible for phraseological fusions. Because the majority of biblicisms are phraseological units, the theoretical part of the work reviews the theory of phraseology and the criteria for classifying phraseological units. Following V. V. Vinogradov's classification, phraseological units are divided into three groups: phraseological fusions, phraseological unities and phraseological combinations, to which is added a fourth group distinguished by V. M. Šanskij, the so-called phraseologized expressions. The empirical material was collected from four newspapers in the Integrum database (Kommersant, Izvestija, Novye Izvestija and Nezavisimaja gazeta), with the search restricted to issues published between 1 November 2010 and 1 May 2011 and targeting the transformations of 21 different biblicisms. The search restricted in this way returned 304 hits, of which 77 (25.3%) were transformations. Some of the examples discussed in the work were additionally taken from Russian-language websites and from the research literature. The research method is data-driven content analysis. The collected material fell into three groups: lexical, semantic and lexical-semantic transformations. In a semantic transformation, the meaning of the biblicism changes while its form stays the same. Its devices include the simultaneous actualization of the literal and figurative meanings of the biblicism, the use of the biblicism in its literal meaning, the use of a single word detached from the original expression, the assignment of a new meaning, and stylistic paradox. In a lexical transformation, the form of the biblicism changes while its meaning stays the same. This is represented by replacing words of the original expression with others, adding words, omitting words and changing the word order. In a lexical-semantic transformation, both the form and the meaning of the biblicism change. Its devices are replacing words of the expression with other words, adding letters to a particular word of the biblicism, syntactic modification, combining two different expressions, and forming a free word combination modeled on the biblicism. The analysis showed that lexical changes are also possible in the case of phraseological fusions. In the case of phraseological combinations, the phraseologically bound word often takes on the meaning of the whole expression, which thus becomes one new meaning of that word. The conclusion is that the non-motivated nature of a phraseological unit does not prevent its transformations. Many biblicisms contain archaic linguistic elements, which is why several biblicisms that were originally solemn in style are used as vehicles of irony or parody. Transformations of biblicisms are used abundantly in the press to evoke new associations or to tie the biblicism to the theme of the article.

    Vision par ordinateur interactive sur le Web [Interactive computer vision through the Web]

    Computer vision is the computational science that aims to reproduce and improve on the ability of human vision to understand its environment. In this thesis, we focus on two fields of computer vision, namely image segmentation and visual odometry, and we show the positive impact that interactive Web applications have on each. The first part of this thesis focuses on image annotation and segmentation. We introduce the image annotation problem and the challenges it brings for large, crowdsourced datasets. Many interactions have been explored in the literature to help segmentation algorithms. The most common consist of designating contours, drawing bounding boxes around objects, or drawing interior and exterior scribbles. When crowdsourcing, annotation tasks are delegated to a non-expert public, sometimes on cheaper devices such as tablets. In this context, we conducted a user study showing the advantages of the outlining interaction over scribbles and bounding boxes. Another challenge of crowdsourcing is the distribution medium. While evaluating an interaction in a small user study does not require a complex setup, distributing an annotation campaign to thousands of potential users may. Thus we describe how the Elm programming language helped us build a reliable image annotation Web application. A tour of its main features and architecture is provided, as well as a guide on how to deploy it to crowdsourcing services such as Amazon Mechanical Turk. The application is completely open source and available online. In the second part of this thesis, we present our open-source direct visual odometry library. In that endeavor, we provide an evaluation of other open-source RGB-D camera tracking algorithms and show that our approach performs as well as the currently available alternatives. The visual odometry problem relies on geometry tools and optimization techniques that traditionally require substantial processing power to run at real-time frame rates. Since we aspire to run those algorithms directly in the browser, we review past and present technologies enabling high-performance computation on the Web. In particular, we detail how to target a new standard called WebAssembly from the C++ and Rust programming languages. Our library was written from scratch in the Rust programming language, which then allowed us to easily port it to WebAssembly. Thanks to this property, we are able to showcase a visual odometry Web application with multiple types of interactions available. A timeline enables one-dimensional navigation along the video sequence. Pairs of image points can be picked on two 2D thumbnails of the image sequence to realign cameras and correct drift. Colors are also used to identify parts of the 3D point cloud, which can be selected to reinitialize camera positions. Combining those interactions improves the tracking and 3D point reconstruction results.

    Image Processing for Cultural Heritage Accessibility: Digitizing the Bayeux Tapestry

    This work showcases image processing and computer vision algorithms from the perspective of accessibility to cultural heritage artifacts. We emphasize the potential of multimodal 2D image registration and fine-scale 3D-reconstruction techniques, with the aim of easing the work of historians and museum curators, as well as making artifacts more accessible to the general public or to visually impaired people. This study focuses on the Bayeux Tapestry, a world-famous medieval wool embroidery included in UNESCO's Memory of the World register, and of fundamental importance for both the scientific community and the general public. This exceptional testimony to eleventh-century society is both a singular artwork and a historical source on a major event in the history of medieval Europe. Developing state-of-the-art image processing tools for its digitization will ease not only its access, but also its analysis, inspection and reproduction.

    Low-rank registration of images captured under unknown, varying lighting

    Photometric stereo infers the 3D-shape of a surface from a sequence of images captured under moving lighting and a static camera. However, in real-world scenarios the viewing angle may slightly vary, due to vibrations induced by the camera shutter, or when the camera is hand-held. In this paper, we put forward a low-rank affine registration technique for images captured under unknown, varying lighting. Optimization is carried out using convex relaxation and the alternating direction method of multipliers. The proposed method is shown to significantly improve 3D-reconstruction by photometric stereo on unaligned real-world data, and an open-source implementation is made available.
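
To illustrate why a low-rank criterion is a natural fit for registering such images, the sketch below builds a synthetic image stack under an idealized Lambertian model (no shadows, no specular highlights, intensities not clamped at zero) and checks that the aligned stack has rank 3, while a misaligned stack does not. This is not the paper's method, which the abstract describes as convex relaxation solved with ADMM; the scene, the lighting directions, the cyclic pixel shifts and the helper `lambertian_stack` are all assumptions made for this demonstration.

```python
import numpy as np


def lambertian_stack(normals, albedo, lights):
    """Return a (pixels x images) intensity matrix under an idealized
    Lambertian model: intensity = albedo * <normal, light>, no shadows."""
    return albedo[:, None] * (normals @ lights.T)


rng = np.random.default_rng(0)
height, width, n_images = 20, 25, 12

# Hypothetical scene: random unit normals and albedos, one per pixel.
normals = rng.normal(size=(height * width, 3))
normals /= np.linalg.norm(normals, axis=1, keepdims=True)
albedo = rng.uniform(0.2, 1.0, size=height * width)

# Hypothetical lighting: one random unit direction per image.
lights = rng.normal(size=(n_images, 3))
lights /= np.linalg.norm(lights, axis=1, keepdims=True)

# Aligned stack: each column is one vectorized image.
aligned = lambertian_stack(normals, albedo, lights)

# Simulated misalignment: image k is cyclically shifted by k pixels
# along the horizontal axis before being vectorized again.
shifted = np.stack(
    [
        np.roll(aligned[:, k].reshape(height, width), k, axis=1).ravel()
        for k in range(n_images)
    ],
    axis=1,
)

rank_aligned = int(np.linalg.matrix_rank(aligned))
rank_shifted = int(np.linalg.matrix_rank(shifted))
print("rank of aligned stack:", rank_aligned)
print("rank of shifted stack:", rank_shifted)
```

The aligned stack factors as a (pixels x 3) matrix of scaled normals times a (3 x images) matrix of lighting directions, so its rank is at most 3. Misalignment breaks this factorization, which suggests why searching for the warps that restore a low-rank stack can serve as a registration objective.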

    3D surface Approximation of the Entire Bayeux Tapestry for Improved Pedagogical Access

    The Bayeux Tapestry is an exceptional cultural heritage masterpiece by its size and the finesse of its details. Digitizing it raises a challenge, since it is extremely fragile and thus lasers or invasive techniques are out of the question. In this work, we address this 3D-reconstruction challenge by introducing a pipeline to generate a high-resolution panorama of the Tapestry's geometry. It is based on a deep learning architecture that converts the RGB images of a pre-existing 2D panorama into a 2.5D normal map panorama. With a view to facilitating inclusive access to the Tapestry, we further show that coupling our 3D-reconstruction pipeline with a segmentation method allows the affordable and rapid creation of 3D-printed bas-reliefs, which can be explored tactilely by visually impaired people.